# Midscene.js - joyful automation by AI

Open-source AI Operator for Web, Mobile App, Automation & Testing

## Features

### Write automation with natural language
- Describe your goals and steps, and Midscene will plan and operate the user interface for you.
- Use the JavaScript SDK or YAML to write your automation script.

### Web or mobile app
- **Web Automation**: Integrate with [Puppeteer](./integrate-with-puppeteer) or [Playwright](./integrate-with-playwright), or use [Bridge Mode](./bridge-mode-by-chrome-extension) to control your desktop browser.
- **Android Automation**: Use the [JavaScript SDK](./integrate-with-android) with adb to control your local Android device.
- **iOS Automation**: Use the [JavaScript SDK](./integrate-with-ios) with WebDriverAgent to control your local iOS device.

### Tools
- **Visual Reports for Debugging**: Through our test reports and Playground, you can easily understand, replay and debug the entire process.
- [**Caching for Efficiency**](./caching): Replay your script with cache and get the result faster.
- **MCP**: Allows other MCP clients to use Midscene's capabilities directly. See [**Web MCP**](./web-mcp) and [**Android MCP**](./mcp-android).

### Three kinds of APIs
- [Interaction API](./api#interaction-methods): interact with the user interface.
- [Data Extraction API](./api#data-extraction): extract data from the user interface and the DOM.
- [Utility API](./api#more-apis): utility functions like `aiAssert()`, `aiLocate()`, `aiWaitFor()`.
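
As a rough sketch of how the three kinds of APIs can combine in one script, the function below taps an element, waits for the page, extracts data, and then asserts on the result. It assumes `agent` is an already-created Midscene agent (for example, from the Puppeteer or Playwright integration). The function name `collectTodoTitles` and every prompt are hypothetical; `aiTap` and `aiQuery` are the methods used in the workflow example further down this page.

```javascript
// Sketch only: `agent` is assumed to be an existing Midscene agent instance.
// `collectTodoTitles` and all prompts are illustrative, not part of Midscene.
async function collectTodoTitles(agent) {
  // Interaction API: operate the user interface.
  await agent.aiTap('the "Show all" filter button');

  // Utility API: wait until the UI reaches the expected state.
  await agent.aiWaitFor('the todo list has finished loading');

  // Data Extraction API: describe the shape of the data you want back.
  const titles = await agent.aiQuery('string[], the titles of all todo items');

  // Utility API: fail the script if the page is in an unexpected state.
  await agent.aiAssert('no error message is shown on the page');

  return titles;
}
```

Passing the agent in as a parameter keeps the function independent of how the agent was created, so the same steps can run against a web page or a device.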


## Showcases

We've prepared some showcases to help you learn how to use Midscene.js.

1. Use JS code to drive task orchestration, collect information about Jay Chou's concert, and write it into Google Docs (using the UI-TARS model)

<video src="https://github.com/user-attachments/assets/75474138-f51f-4c54-b3cf-46d61d059999" height="300" controls />

2. Control the Maps app on Android (using the Qwen-2.5-VL model)

<video src="https://github.com/user-attachments/assets/1f5bab0e-4c28-44e1-b378-a38809b05a00" height="300" controls />

3. Use the Midscene MCP to browse a page (https://www.saucedemo.com/), log in, add products, and place an order, then generate test cases based on the MCP execution steps and a Playwright example

<video src="https://github.com/user-attachments/assets/a95ca353-e50c-4091-85ba-e542f576b6be" height="300" controls />


## Zero-code quick experience

- **[Chrome Extension](./quick-experience)**: Try Midscene in your browser right away, without writing any code.
- **[Android Playground](./quick-experience-with-android)**: Use the built-in Android playground to control your local Android device.
- **[iOS Playground](./quick-experience-with-ios)**: Use the built-in iOS playground to control your local iOS device.

## Model choices

Midscene.js supports both multimodal LLMs like `gpt-4o` and visual-language models like `Qwen2.5-VL`, `Doubao-1.5-thinking-vision-pro`, `gemini-2.5-pro`, and `UI-TARS`.

Visual-language models are recommended for UI automation.

Read more in [Choose a model](./choose-a-model).

## Two styles of automation

### Auto planning

Midscene automatically plans the steps and executes them. This style may be slower, and it relies heavily on the quality of the AI model.

```javascript
await agent.aiAction('click all the records one by one. If one record contains the text "completed", skip it');
```

### Workflow style

Split complex logic into multiple steps to improve the stability of the automation code.

```javascript
const recordList = await agent.aiQuery('string[], the record list')
for (const record of recordList) {
  const hasCompleted = await agent.aiBoolean(`check if the record "${record}" contains the text "completed"`)
  if (!hasCompleted) {
    await agent.aiTap(record)
  }
}
```

> For more details about the workflow style, please refer to [Blog - Use JavaScript to Optimize the AI Automation Code](./blog-programming-practice-using-structured-api)
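
One way to make the workflow style easier to reuse, and to test without a model, is to wrap the loop in a function that takes the agent as a parameter. This is a sketch under assumptions: `tapIncompleteRecords` is a hypothetical helper name, and `agent` is assumed to expose `aiQuery`, `aiBoolean`, and `aiTap` as used in the example above.

```javascript
// Hypothetical helper: wraps the workflow-style loop so it can be reused.
// `agent` is assumed to be a Midscene agent (or any object with the same methods).
async function tapIncompleteRecords(agent) {
  const tapped = [];
  const recordList = await agent.aiQuery('string[], the record list');
  for (const record of recordList) {
    // One small question per record keeps each AI call easy to verify.
    const hasCompleted = await agent.aiBoolean(
      `check if the record "${record}" contains the text "completed"`
    );
    if (!hasCompleted) {
      await agent.aiTap(record);
      tapped.push(record);
    }
  }
  // Names of the records that were tapped, in order.
  return tapped;
}
```

Because the agent is passed in, a stub object with the same three methods can stand in for it in unit tests, so the control flow can be checked without calling a model.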


## Comparing to other projects

There are so many UI automation tools out there, and each one seems to be all-powerful. What's special about Midscene.js?

* **Debugging Experience**: You will soon realize that debugging and maintaining automation scripts is the real challenge. No matter how magical the demo looks, ensuring stability over time requires careful debugging. Midscene.js offers a visualized report file, a built-in playground, and a Chrome Extension to simplify the debugging process. These are the tools most developers truly need, and we're continually working to improve the debugging experience.

* **Open Source, Free, Deploy as You Want**: Midscene.js is an open-source project. It is decoupled from any cloud service or model provider, so you can choose either public or private deployment. There is always a suitable plan for your business.

* **Integrate with JavaScript**: You can always bet on JavaScript 😎

## Resources

* Home Page and Documentation: [https://midscenejs.com](https://midscenejs.com/)
* Sample Projects: [https://github.com/web-infra-dev/midscene-example](https://github.com/web-infra-dev/midscene-example)
* API Reference: [https://midscenejs.com/api.html](./api)
* GitHub: [https://github.com/web-infra-dev/midscene](https://github.com/web-infra-dev/midscene)

## Community

* [Discord](https://discord.gg/2JyBHxszE4)
* [Follow us on X](https://x.com/midscene_ai)
* [Lark Group(飞书交流群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=291q2b25-e913-411a-8c51-191e59aab14d)


## Credits

We would like to thank the following projects:

- [Rsbuild](https://github.com/web-infra-dev/rsbuild) and [Rslib](https://github.com/web-infra-dev/rslib) for the build tool.
- [UI-TARS](https://github.com/bytedance/ui-tars) for the open-source agent model UI-TARS.
- [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL) for the open-source VL model Qwen2.5-VL.
- [scrcpy](https://github.com/Genymobile/scrcpy) and [yume-chan](https://github.com/yume-chan) for allowing us to control Android devices from the browser.
- [appium-adb](https://github.com/appium/appium-adb) for the JavaScript bridge to adb.
- [appium-webdriveragent](https://github.com/appium/WebDriverAgent) for operating XCTest from JavaScript.
- [YADB](https://github.com/ysbing/YADB) for the yadb tool which improves the performance of text input.
- [Puppeteer](https://github.com/puppeteer/puppeteer) for browser automation and control.
- [Playwright](https://github.com/microsoft/playwright) for browser automation, control, and testing.

## License

Midscene.js is [MIT licensed](https://github.com/web-infra-dev/midscene/blob/main/LICENSE).
